Learning Local Content Shift Detectors from Document-level Information

Author

  • Richárd Farkas
Abstract

Information-oriented document labeling is a special document multi-labeling task in which the target labels refer to specific pieces of information rather than to the topic of the whole document. Tasks of this kind are usually solved by looking up indicator phrases and analyzing their local context to filter out false positive matches. Here, we introduce a machine learning approach for local content shifters, which detects irrelevant local contexts using only the original document-level training labels. We handle content shifters in general, instead of learning a detector for one particular language phenomenon (e.g. negation or hedging), and form a single system for document labeling and content shift detection. In our experiments, the approach achieved a 24% error reduction, compared to supervised baseline methods, on three document labeling tasks.
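
To make the conventional setup mentioned in the abstract concrete (look up indicator phrases, then inspect the local context to filter false positive matches), here is a minimal Python sketch. It is not the learned approach the paper proposes: the label name, the indicator phrases, the shifter cue words, and the window size are all hypothetical values chosen for illustration, and the hand-written cue list stands in for what the paper learns from document-level labels.

```python
import re

# Hypothetical indicator phrases per label (not taken from the paper).
INDICATORS = {"smoker": {"smokes", "smoker", "smoking"}}

# Hypothetical content-shifter cues, e.g. negation words (not from the paper).
SHIFTERS = {"no", "not", "denies", "never", "without"}

# Number of tokens inspected before each indicator match (an assumption).
WINDOW = 3


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_document(text):
    """Return the labels whose indicator occurs in an unshifted local context."""
    tokens = tokenize(text)
    labels = set()
    for label, phrases in INDICATORS.items():
        for i, token in enumerate(tokens):
            if token not in phrases:
                continue
            # Local context: up to WINDOW tokens preceding the match.
            context = tokens[max(0, i - WINDOW):i]
            # Keep the match only if no shifter cue appears in the context.
            if not SHIFTERS.intersection(context):
                labels.add(label)
    return labels
```

Under these assumptions, `label_document("The patient smokes daily.")` assigns the hypothetical `smoker` label, while `label_document("The patient denies smoking.")` assigns none, because the cue `denies` falls inside the context window of the indicator match.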

Similar resources

An Immune-based Approach to Document Classification

Keywords: artificial immune system, document classification, machine learning, concept learning, coevolution. The human immune system as a biological complex adaptive system has provided inspiration for a range of innovative problem solving techniques in areas such as computer security, knowledge management and information retrieval. In this paper the construction and performance of a novel immune-based l...

Hebbian learning and competition in the neural abstraction pyramid

The recently introduced Neural Abstraction Pyramid is a hierarchical neural architecture for image interpretation that is inspired by the principles of information processing found in the visual cortex. In this paper we present an unsupervised learning algorithm for its connectivity based on Hebbian weight updates and competition. The algorithm yields a sequence of feature detectors that produ...

A Light-weight Relevance Feedback Solution for Large Scale Content-Based Video Retrieval

This paper addresses the problem of large scale content-based video retrieval with relevance feedback. We analyze the common methods which leverage local feature detectors to extract feature descriptors from video collections and perform multi-level matching after indexing and retrieval of feature vectors. Instead of learning similarity-preserving codes, we introduce the relevance feedback appr...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large amount of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Devising ethical codes for e-contents in e-learning

Background: Promoting ethics is one of the goals of education, but the free flow of communication and the disclosure of unethical behaviors in e-learning create an urgent need to clarify ethical values. Therefore, the aim of this study was to prepare ethical codes for developing and delivering e-contents. Methods: A draft of e-content ethical codes was prepared based on the literature review. Then, it was ...

Publication date: 2011